Retrieval of Personal Web Pages Based on Web Page Clustering
Similar resources
Structure-Based Web Pages Clustering
Recognizing similarities among the documents of a set is one of the objectives of information retrieval. Information about the similarities among web pages can be used to present similar documents to users and help them retrieve the information they seek. In the present study, a new algorithm is proposed to cluster web pages based on their structure. The proposed algorithm is based on h...
Clustering Web pages based on their structure
Several techniques have been recently proposed to automatically generate Web wrappers, i.e., programs that extract data from HTML pages, and transform them into a more structured format, typically in XML. These techniques automatically induce a wrapper from a set of sample pages that share a common HTML template. An open issue, however, is how to collect suitable classes of sample pages to feed...
Web Page Ranking Based on Text Substance of Linked Pages
The World Wide Web is a large repository of interlinked hypertext documents accessed via the Internet. The Web may contain text, images, video, and other multimedia data, which the user navigates through hyperlinks. A search engine returns millions of results and applies Web mining techniques to order them. The sorted order of search results is obtained by applying some special algorithms ca...
Self-organizing map based web pages clustering using web logs
A Web-based business always wants to have the ability to track users’ browsing behavior history. This ability can be achieved by using Web log mining technologies. In this paper, we introduce a Self-Organizing Map (SOM) based approach to mining Web log data. The SOM network maps the web pages into a two-dimensional map based on the users’ browsing history. Web pages with the similar browsing pa...
Clustering-Based Relevance Feedback for Web Pages
Most traditional relevance feedback systems simply choose the top-ranked Web pages for a query as the source of weights for candidate query expansion terms. However, the whole contents of such top-ranked Web pages are usually mixed with sub-topically distinguishable contents that are too heterogeneous to be directly used to extract good quality candidate query expansio...
Journal
Journal title: Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Year: 2006
ISSN: 1881-7203,1347-7986
DOI: 10.3156/jsoft.18.161